Experimental postgres query (PR 3/4): multi-input + error formatting#5138
Open
simonfaltum wants to merge 13 commits into simonfaltum/postgres-query-pr2-streaming from simonfaltum/postgres-query-pr3-multi-input
Conversation
This is PR 3 of the experimental postgres query stack. Adds the rest
of the input ergonomics promised in the plan and the error-formatting
polish.
Inputs: positional args become variadic, --file is repeatable, stdin
is read when neither is present, and a positional ending in '.sql'
that exists on disk is treated as a SQL file. Execution order is
files-first then positionals (cobra/pflag does not preserve the
interleaved order in which flags and positionals were typed; documented
in --help).
Each input unit must contain exactly one statement. checkSingleStatement
walks the SQL with a hand-written conservative scanner that ignores
';' inside single-quoted strings, double-quoted identifiers, line
comments, block comments, and dollar-quoted bodies. Multi-statement
strings are rejected before connect with a hint pointing at the
multi-input alternatives.
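The conservative scan described above can be sketched as follows. This is a minimal illustration, not the actual checkSingleStatement implementation: the function and helper names (`countStatements`, `dollarTagEnd`) are hypothetical, and only the skip rules named in the text are modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// countStatements counts statements by scanning for top-level ';',
// skipping single-quoted strings, double-quoted identifiers, -- line
// comments, /* */ block comments, and $tag$...$tag$ dollar-quoted bodies.
// A trailing ';' does not start a new statement.
func countStatements(sql string) int {
	stmts, hasBody := 0, false
	i := 0
	for i < len(sql) {
		c := sql[i]
		switch {
		case c == ';':
			if hasBody {
				stmts++
			}
			hasBody = false
			i++
		case c == ' ' || c == '\t' || c == '\n' || c == '\r':
			i++
		case c == '\'': // single-quoted string; '' escapes a quote
			hasBody = true
			i++
			for i < len(sql) {
				if sql[i] == '\'' {
					if i+1 < len(sql) && sql[i+1] == '\'' {
						i += 2
						continue
					}
					i++
					break
				}
				i++
			}
		case c == '"': // double-quoted identifier
			hasBody = true
			i++
			for i < len(sql) && sql[i] != '"' {
				i++
			}
			if i < len(sql) {
				i++
			}
		case c == '-' && i+1 < len(sql) && sql[i+1] == '-': // line comment
			for i < len(sql) && sql[i] != '\n' {
				i++
			}
		case c == '/' && i+1 < len(sql) && sql[i+1] == '*': // block comment
			i += 2
			for i+1 < len(sql) && !(sql[i] == '*' && sql[i+1] == '/') {
				i++
			}
			i = min(i+2, len(sql))
		case c == '$': // maybe a dollar-quote opener
			hasBody = true
			if end, ok := dollarTagEnd(sql, i); ok {
				tag := sql[i:end]
				rest := strings.Index(sql[end:], tag)
				if rest < 0 {
					return stmts + 1 // unterminated: rest is one body
				}
				i = end + rest + len(tag)
			} else {
				i++ // e.g. $1 placeholder: not a tag
			}
		default:
			hasBody = true
			i++
		}
	}
	if hasBody {
		stmts++
	}
	return stmts
}

// dollarTagEnd reports whether sql[i:] opens a valid dollar-quote tag;
// digit-leading tags are rejected so $1 stays a bind placeholder.
func dollarTagEnd(sql string, i int) (int, bool) {
	j := i + 1
	for j < len(sql) && isTagByte(sql[j]) {
		j++
	}
	if j >= len(sql) || sql[j] != '$' {
		return 0, false
	}
	if j > i+1 && sql[i+1] >= '0' && sql[i+1] <= '9' {
		return 0, false
	}
	return j + 1, true
}

func isTagByte(b byte) bool {
	return b == '_' || 'a' <= b && b <= 'z' || 'A' <= b && b <= 'Z' || '0' <= b && b <= '9'
}

func main() {
	fmt.Println(countStatements(`SELECT 1; SELECT 2`))            // 2
	fmt.Println(countStatements(`SELECT ';' -- not a separator`)) // 1
	fmt.Println(countStatements(`SELECT $1 FROM t;`))             // 1 (trailing ';' tolerated)
}
```

Single-statement input is then `countStatements(sql) == 1`; anything greater is rejected before connecting.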
Multi-input output:
- text: each per-unit result rendered inline, separated by a blank
line (mirrors psql's compact text shape).
- json: top-level array of per-unit result objects with shape
{"sql","kind","elapsed_ms",...}; rows-producing units carry a
"rows":[...] array, command-only carry "command"+"rows_affected".
Each per-unit object is buffered to completion before write; the
outer array streams across units. The plan accepts this trade-off:
huge SELECTs in multi-input invocations buffer.
- csv: rejected pre-flight when N>1 (no sensible cross-schema shape).
Single-input csv keeps streaming.
Per-unit errors render as a {"kind":"error", ...} entry in the JSON
shape so scripts can detect failure without checking exit code.
Sequential execution stops on the first failing unit; the successful
prefix is rendered.
formatPgError renders *pgconn.PgError with SEVERITY, SQLSTATE, DETAIL,
HINT inline. Non-PgError values pass through unchanged so connect-time
errors keep their original wording.
Single-input keeps the streaming sinks from PR 2; only multi-input goes
through the buffered renderer. Session state (SET, temp tables) carries
across input units because they share one connection.
TUI for >30 rows is deferred to a follow-up. The current text path uses
the static tabwriter table for both single- and multi-input.
Co-authored-by: Isaac
MUSTs:
- Multi-input JSON error envelope: thread the failing *unitResult into
renderJSONMulti so source/sql/elapsed_ms reflect the actual failing
input instead of empty strings.
- Canonical key order for every per-unit object:
{"source", "sql", "kind", "elapsed_ms", payload}
Success and error envelopes now share the same shape so consumers
don't have to special-case kind=="error" for missing fields.
SHOULDs:
- Single-input path now goes through formatPgError, so DETAIL/HINT
surface consistently across single- and multi-input.
- runUnitBuffered reuses executeOne via a new bufferSink. The two
query loops collapse to one; future error-handling changes auto-
propagate.
- Scanner: reject `$<digit>...` as a dollar-quote tag (PG docs forbid
digit-leading tags). Pinned with a test for `SELECT $1, $2 FROM t`
and `SELECT $1 FROM t; SELECT 2`.
- Pin the E-string over-rejection behaviour with a test, so a future
scanner improvement has to update the assertion.
CONSIDERs:
- Capture elapsed_ms on the error path too (was previously discarded).
- Promote multiStatementHint to a const.
- Drop jsonEscapeShort (was a fragile micro-opt for an always-ASCII
domain); use marshalJSON for the command verb instead.
- Add TestRenderJSONMulti_FirstUnitFails to pin the empty-success-
prefix framing.
Co-authored-by: Isaac
Round-2 reviewer noted jsonErrorObject's defensive branches around writeJSONUnitHeader/marshalJSON are unreachable (encoding/json doesn't error on string inputs), and the repo rule says drop "just in case" fallbacks. Replace with panic-on-impossible helpers. Co-authored-by: Isaac
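The "panic-on-impossible" style might look like this (hypothetical helper name; `encoding/json` cannot fail on a plain string input, so there is no error branch to keep alive):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mustMarshal replaces a dead "just in case" error branch with a panic:
// marshalling a plain string cannot fail, so any error here is a bug.
func mustMarshal(v string) []byte {
	b, err := json.Marshal(v)
	if err != nil {
		panic(fmt.Sprintf("marshal of string cannot fail: %v", err))
	}
	return b
}

func main() {
	fmt.Println(string(mustMarshal("DELETE")))
}
```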
Contributor
Invalid dollar-quote tag character validation allows multi-statement SQL to bypass the single-statement guard
🔍 Reviewed by nitpicker
arsenyinfo approved these changes (Apr 30, 2026)
bufferSink.Begin stashed the FieldDescription slice it was handed. pgx reuses that slice's backing array across queries on the same connection (pgConn.fieldDescriptions is a fixed-size buffer that's re-sliced per statement), so each buffered unit's Fields ended up pointing at the LAST query's row description. The multi-input renderers then emitted the wrong column names for every unit but the last. Clone the slice so each buffered unit owns its column descriptions. Co-authored-by: Isaac
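The aliasing bug in that commit can be reproduced with a stand-in type (`fieldDesc` mirrors only the relevant shape of `pgconn.FieldDescription`; the `clone` flag is illustrative, contrasting the buggy and fixed behavior):

```go
package main

import (
	"fmt"
	"slices"
)

// fieldDesc stands in for pgconn.FieldDescription.
type fieldDesc struct{ Name string }

type bufferSink struct{ fields []fieldDesc }

// Begin with clone=false models the bug (stash the driver's slice);
// clone=true models the fix (own the column descriptions).
func (s *bufferSink) Begin(fields []fieldDesc, clone bool) {
	if clone {
		s.fields = slices.Clone(fields)
	} else {
		s.fields = fields // aliases the driver's reusable backing array
	}
}

func main() {
	driverBuf := make([]fieldDesc, 4) // fixed-size buffer, re-sliced per query
	var unit1 bufferSink

	driverBuf[0] = fieldDesc{Name: "id"} // query 1 row description
	unit1.Begin(driverBuf[:1], false)

	driverBuf[0] = fieldDesc{Name: "total"} // query 2 overwrites the buffer

	fmt.Println(unit1.fields[0].Name) // "total" — unit 1 now shows unit 2's column
}
```

With `clone=true`, unit 1 keeps seeing `"id"` after the buffer is reused, which is exactly what the fix guarantees.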
simonfaltum added a commit that referenced this pull request (May 5, 2026)
## PR Stack

1. **PR 1 (this PR)** — [#5135](#5135) — scaffold + autoscaling targeting + text output
2. [#5136](#5136) — PR 2: provisioned + JSON/CSV streaming + typed values
3. [#5138](#5138) — PR 3: multi-input + multi-statement rejection + error formatting
4. [#5143](#5143) — PR 4: cancellation + timeout + TUI

## Why

Talking to Lakebase Postgres from a script today goes through one of two unpleasant paths:

1. **Shell out to `databricks psql -- -c "SQL"`.** Works on macOS / Linux when psql is installed. Fails on Windows 11 by default and on minimal containers / sandboxed CI. No JSON / CSV.
2. **Write Python with `psycopg`.** Forces every consumer to manage OAuth tokens, SSL mode, autocommit, etc.

This series adds a third path: a native CLI command that runs SQL against any Lakebase endpoint, returns results in text/JSON/CSV (later PRs), and works without a system psql. `databricks psql` keeps owning the interactive REPL surface; this PR does **not** touch psql.

## Changes

**Before:** No CLI command runs SQL against Lakebase from Go. Users either shell out to `psql` (requires the system binary) or write `psycopg` glue.

**Now:** `databricks experimental postgres query --target projects/foo/branches/main/endpoints/primary "SELECT 1"` returns a text-rendered result. Provisioned instances and richer output formats land in follow-up PRs.

The experimental command is fully contained under `experimental/postgres/cmd/`:

- `experimental/postgres/cmd/cmd.go`, `query.go`, `targeting.go`, `connect.go`, `execute.go`, `render.go` — command implementation.
- `experimental/postgres/cmd/internal/target/` — Lakebase target resolution helpers (parsing, SDK wrappers, auto-select-when-exactly-one). Internal sub-package so it can't accidentally be imported from outside the experiment. When/if this command graduates from experimental, that's the right time to consider extracting to `libs/`.

Single positional SQL, autoscaling targeting only (`--target`, `--project`, `--branch`, `--endpoint`), `--max-retries`, `--connect-timeout`, `--database`. Driver is `github.com/jackc/pgx/v5 v5.9.1` (MIT). Connect retry uses a typed predicate (08xxx SQLSTATE family + `57P03` cannot_connect_now + `net.OpError` with `Op == "dial"`); auth (28xxx) and permission (42501) errors do not retry. Text rendering is buffered (no streaming yet); rows-producing vs command-only is decided at runtime via `FieldDescriptions()`.

Outside the experimental tree, this PR only:

- Registers the command in `cmd/experimental/experimental.go` (2 lines).
- Adds the pgx direct dependency (`go.mod` SPDX annotation, `NOTICE` entry, `NEXT_CHANGELOG.md` dependency-updates entry). `pgx` is already a direct dep of the universe monorepo's Lakebase services; aligning here keeps the SDK surface consistent.

## Test plan

- [x] `go test ./experimental/postgres/...` (target parsing, validateTargeting, retry classification, render)
- [x] `go test ./internal/build/...` (license + NOTICE completeness)
- [x] `go tool ... golangci-lint run ./experimental/postgres/...` (0 issues)
- [x] `./task checks` (whitespace, links, deadcode)
## PR Stack

Stacked on PR 2 (PR 2 added `sqlcli.ResolveFormat`; this PR adds `sqlcli.Collect`).

## Why

PR 2 shipped a single-statement, single-input command. Real workflows want multi-input (set-then-query, file-then-stdin), multi-statement rejection with a friendly hint, and rich pg error formatting.

This PR also extends `experimental/libs/sqlcli` with input-collection logic shared by aitools and postgres. Same architectural principle as PR 2: instead of postgres growing its own duplicate of aitools' `resolveSQLs`, both commands now call `sqlcli.Collect`.

## Changes

Architectural: `experimental/libs/sqlcli/input.go` adds:

- `sqlcli.SQLFileExtension` const (`.sql`).
- `sqlcli.Input{SQL, Source}` type — Source is the human-readable origin label (`"--file PATH"`, `"argv[N]"`, `"stdin"`).
- `sqlcli.CollectOptions{Cleaner func(string) string}` — aitools passes its `cleanSQL` (strips comments+quotes); postgres passes the default `TrimSpace` because its multi-statement scanner needs comments preserved.
- `sqlcli.Collect` — files-first then positionals, stdin only when neither is present, `.sql` autodetect on positionals.

aitools' `resolveSQLs` collapses to a thin wrapper around `sqlcli.Collect` (drops the SQL strings, ignores Source). The "SQL statement #N is empty after removing comments" wording is replaced with sqlcli's "argv[N] is empty"; aitools tests updated.

User-facing changes for postgres query:

- Variadic positional SQL, repeatable `--file` + stdin fallback.
- Multi-statement rejection (the scanner ignores `;` inside string literals, identifiers, line/block comments, and dollar-quoted bodies; a tag must be a valid unquoted identifier, so `$1` and `$foo-bar$` are correctly NOT treated as tags).
- Per-unit envelopes `{"source","sql","kind","elapsed_ms",...}` for json; csv rejected pre-flight when N>1.
- pg error formatting (`SEVERITY: message (SQLSTATE XXXXX)` with DETAIL/HINT lines), applied on both single-input and multi-input paths. Single-input keeps streaming.

`runUnitBuffered` is a thin wrapper around `executeOne` + a `bufferSink`, so the row-loop and error-wrapping logic stays in one place.

## Test plan

- `go test ./experimental/...` (multistatement scanner: 28 cases including dollar-tag punctuation rejection, `sqlcli.Collect`: 12 cases including a custom-cleaner test, error formatting, multi-input renderers including byte-equal canonical-shape and first-unit-fails framing)
- `go tool ... golangci-lint run ./experimental/...` (0 issues)